Predicting Human-Targeted Translation Edit Rate via Untrained Human Annotators
Abstract
In the field of machine translation, automatic metrics have proven quite valuable in system development for tracking progress and measuring the impact of incremental changes. However, human judgment still plays a large role in the context of evaluating MT systems. For example, the GALE project uses human-targeted translation edit rate (HTER), wherein the MT output is scored against a post-edited version of itself (as opposed to being scored against an existing human reference). This poses a problem for MT researchers, since HTER is not an easy metric to calculate: it requires hiring and training human annotators to perform the editing task. In this work, we explore soliciting those edits from untrained human annotators via the online service Amazon Mechanical Turk. We show that the collected data allows us to predict the HTER ranking of documents at a significantly higher level of accuracy than the ranking obtained using automatic metrics.
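As a rough illustration of the metric involved, here is a minimal Python sketch of HTER scoring, assuming a plain word-level edit distance as a stand-in for full TER (which also counts phrase shifts); the function names and example sentences are illustrative, not the paper's implementation.

```python
# Minimal HTER sketch: edits needed to turn the MT output into its own
# post-edited version, normalized by the post-edit length. Word-level
# Levenshtein distance is used here as an approximation of TER.

def edit_distance(hyp_tokens, ref_tokens):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    m, n = len(hyp_tokens), len(ref_tokens)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp_tokens[i - 1] == ref_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[m][n]

def hter(mt_output, post_edit):
    """Edit distance to the post-edited version, divided by its length."""
    hyp, ref = mt_output.split(), post_edit.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

# Example: one insertion over a 7-word post-edit.
print(hter("the cat sat on mat quietly",
           "the cat sat on the mat quietly"))  # 1 / 7 ≈ 0.14
```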
Similar Resources
Ranking Machine Translation Systems via Post-editing
In this paper we investigate ways in which information from the post-editing of machine translations can be used to rank translation systems for quality. In addition to the commonly used edit distance between the raw translation and its edited version, we consider post-editing time and keystroke logging, since these can account not only for technical effort, but also cognitive effort. In this sy...
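To make the ranking idea concrete, here is a minimal sketch, assuming post-editing logs that already carry a normalized edit rate and a post-editing time per segment; the field names and the plain averaging scheme are illustrative assumptions, not the procedure used in that paper.

```python
# Sketch: rank MT systems by the mean of one post-editing measure per system.
from collections import defaultdict

def rank_systems(records, key="edit_rate"):
    """Return (system, mean score) pairs sorted ascending (lower is better)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for rec in records:
        totals[rec["system"]] += rec[key]
        counts[rec["system"]] += 1
    means = {sys: totals[sys] / counts[sys] for sys in totals}
    return sorted(means.items(), key=lambda kv: kv[1])

# Example: two systems, two post-edited segments each.
logs = [
    {"system": "sysA", "edit_rate": 0.20, "seconds": 35.0},
    {"system": "sysA", "edit_rate": 0.30, "seconds": 42.0},
    {"system": "sysB", "edit_rate": 0.45, "seconds": 61.0},
    {"system": "sysB", "edit_rate": 0.40, "seconds": 58.0},
]
print(rank_systems(logs, key="edit_rate"))  # [('sysA', 0.25), ('sysB', 0.425)]
print(rank_systems(logs, key="seconds"))    # [('sysA', 38.5), ('sysB', 59.5)]
```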
A Radically Simple, Effective Annotation and Alignment Methodology for Semantic Frame Based SMT and MT Evaluation
We introduce a radically simple yet effective methodology for annotating and aligning semantic frames inexpensively using untrained lay annotators, which is ideally suited for practical semantic SMT and evaluation applications. For example, recent work by Lo and Wu (2011) introduced MEANT and HMEANT, which are state-of-the-art metrics that evaluate translation meaning preservation via Propbank s...
A Study of Translation Edit Rate with Targeted Human Annotation
We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge-intensiveness of more meaning-based approaches and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-refere...
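Written out, the measure described above is the ratio of the number of edits (insertions, deletions, substitutions, and phrase shifts) to the number of reference words; with multiple references the denominator is the average reference length:

```latex
\mathrm{TER} = \frac{\#\,\text{edits}}{\text{average}\ \#\,\text{reference words}}
```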
Crowdsourcing Annotation for Machine Learning in Natural Language Processing Tasks
Human annotators are critical for creating the necessary datasets to train statistical learners, but annotation cost and limited access to qualified annotators form a data bottleneck. In recent years, researchers have investigated overcoming this obstacle using crowdsourcing, which is the delegation of a particular task to a large group of untrained individuals rather than a select trained few...
Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk
Manual evaluation of translation quality is generally thought to be excessively time-consuming and expensive. We explore a fast and inexpensive way of doing it using Amazon's Mechanical Turk to pay small sums to a large number of non-expert annotators. For $10 we redundantly recreate judgments from a WMT08 translation task. We find that when combined, non-expert judgments have a high level of ag...
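One simple way such redundant non-expert judgments can be combined is a per-item majority vote, sketched below; the data layout and tie-breaking rule are illustrative assumptions rather than the weighting used in that paper.

```python
# Sketch: collapse redundant annotator labels into one consensus label per item.
from collections import Counter

def combine_judgments(labels_per_item):
    """Return one majority-vote label per item from redundant annotator labels."""
    consensus = {}
    for item_id, labels in labels_per_item.items():
        counts = Counter(labels)
        # Most common label wins; ties fall back to the first label seen.
        consensus[item_id] = counts.most_common(1)[0][0]
    return consensus

# Example: three Turkers judge two sentence pairs.
raw = {
    "seg-1": ["A_better", "A_better", "B_better"],
    "seg-2": ["B_better", "B_better", "B_better"],
}
print(combine_judgments(raw))  # {'seg-1': 'A_better', 'seg-2': 'B_better'}
```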